Search results for "Multi-core processor"

showing 10 items of 35 documents

SPECTR

2018

Modern high throughput sequencing platforms can produce large amounts of short read DNA data at low cost. Error correction is an important but time-consuming initial step when processing this data in order to improve the quality of downstream analyses. In this paper, we present a Scalable Parallel Error CorrecToR designed to improve the throughput of DNA error correction for Illumina reads on various parallel platforms. Our design is based on a k-spectrum approach where a Bloom filter is frequently probed as a key operation and is optimized towards AVX-512-based multi-core CPUs, Xeon Phi many-cores (both KNC and KNL), and heterogeneous compute clusters. A number of architecture-specific opt…

0301 basic medicine03 medical and health sciencesMulti-core processor030104 developmental biologySpeedupXeonComputer scienceData structure alignmentParallel computingError detection and correctionSupercomputerThroughput (business)Xeon PhiProceedings of the 47th International Conference on Parallel Processing

researchProduct

2016

The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 alig…

0301 basic medicinePhysics020203 distributed computingMulti-core processorDistributed shared memoryMultidisciplinarySource codemedia_common.quotation_subjectNode (networking)02 engineering and technologyDynamic priority schedulingParallel computingBioinformatics03 medical and health sciences030104 developmental biologyScalability0202 electrical engineering electronic engineering information engineeringProgramming paradigmPartitioned global address spacemedia_commonPLOS ONE

researchProduct

Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data

2016

Next-generation sequencing technologies have led to the sequencing of more and more genomes, propelling related research into the era of big data. In this paper, we present ParaBWT, a parallelized Burrows-Wheeler transform (BWT) and suffix array construction algorithm for big genome data. In ParaBWT, we have investigated a progressive construction approach to constructing the BWT of single genome sequences in linear space complexity, but with a small constant factor. This approach has been further parallelized using multi-threading based on a master-slave coprocessing model. After gaining the BWT, the suffix array is constructed in a memory-efficient manner. The performance of ParaBWT has b…

0301 basic medicineTheoretical computer scienceBurrows–Wheeler transformComputer scienceGenomicsData_CODINGANDINFORMATIONTHEORYParallel computingGenomelaw.invention03 medical and health scienceslawGeneticsHumansEnsemblMulti-core processorApplied MathematicsLinear spaceSuffix arrayChromosome MappingHigh-Throughput Nucleotide SequencingGenomicsSequence Analysis DNA030104 developmental biologyAlgorithmsBiotechnologyReference genomeIEEE/ACM Transactions on Computational Biology and Bioinformatics

researchProduct

mD3DOCKxb: An Ultra-Scalable CPU-MIC Coordinated Virtual Screening Framework

2017

Molecular docking is an important method in computational drug discovery. In large-scale virtual screening, millions of small drug-like molecules (chemical compounds) are compared against a designated target protein (receptor). Depending on the utilized docking algorithm for screening, this can take several weeks on conventional HPC systems. However, for certain applications including large-scale screening tasks for newly emerging infectious diseases such high runtimes can be highly prohibitive. In this paper, we investigate how the massively parallel neo-heterogeneous architecture of Tianhe-2 Supercomputer consisting of thousands of nodes comprising CPUs and MIC coprocessors that can effic…

0301 basic medicineVirtual screeningMulti-core processorCoprocessorComputer sciencebusiness.industryParallel computingSupercomputer03 medical and health sciences030104 developmental biologyEmbedded systemScalabilityTianhe-2Algorithm designbusinessMassively parallel2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

researchProduct

Extending PluTo for Multiple Devices by Integrating OpenACC

2018

For many years now, processor vendors increased the performance of their devices by adding more cores and wider vectorization units to their CPUs instead of scaling up the processors' clock frequency. Moreover, GPUs became popular for solving problems with even more parallel compute power. To exploit the full potential of modern compute devices, specific codes are necessary which are often coded in a hardware-specific manner. Usually, the codes for CPUs are not usable for GPUs and vice versa. The programming API OpenACC tries to close this gap by enabling one code-base to be suitable and optimized for many devices. Nevertheless, OpenACC is rarely used by `standard programmers' and while dif…

060201 languages & linguisticsMulti-core processorExploitComputer scienceClock rate06 humanities and the arts02 engineering and technologyParallel computingUSablecomputer.software_genrePluto0602 languages and literature0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processingCompilercomputer2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)

researchProduct

Practical considerations for acoustic source localization in the IoT era: Platforms, energy efficiency, and performance

2019

The rapid development of the Internet of Things (IoT) has posed important changes in the way emerging acoustic signal processing applications are conceived. While traditional acoustic processing applications have been developed taking into account high-throughput computing platforms equipped with expensive multichannel audio interfaces, the IoT paradigm is demanding the use of more flexible and energy-efficient systems. In this context, algorithms for source localization and ranging in wireless acoustic sensor networks can be considered an enabling technology for many IoT-based environments, including security, industrial, and health-care applications. This paper is aimed at evaluating impo…

Computer Networks and CommunicationsComputer scienceDistributed computingContext (language use)02 engineering and technologyParallel architectures0202 electrical engineering electronic engineering information engineeringParallel processingWirelessSignal processingMulti-core processorHeterogeneous (hybrid) systemsbusiness.industry020206 networking & telecommunicationsAcoustic source localizationWireless acoustic sensor networks (WASNs)Computer Science ApplicationsEnergy efficiencyHardware and ArchitectureSignal Processing020201 artificial intelligence & image processingElectrónicabusinessWireless sensor networkSource localizationInformation SystemsEfficient energy useAcoustic signal processing

researchProduct

Domain-Knowledge Optimized Simulated Annealing for Network-on-Chip Application Mapping

2013

Network-on-Chip architectures are scalable on-chip interconnection networks. They replace the inefficient shared buses and are suitable for multicore and manycore systems. This paper presents an Optimized Simulated Annealing (OSA) algorithm for the Network-on-Chip application mapping problem. With OSA, the cores are implicitly and dynamically clustered using knowledge about communication demands. We show that OSA is a more feasible Simulated Annealing approach to NoC application mapping by comparing it with a general Simulated Annealing algorithm and a Branch and Bound algorithm, too. Using real applications we show that OSA is significantly faster than a general Simulated Annealing, withou…

Computer Science::Hardware ArchitectureInterconnectionMulti-core processorNetwork on a chipBranch and boundComputer scienceScalabilitySimulated annealingComputer Science::Networking and Internet ArchitectureParallel computingAdaptive simulated annealingCluster analysis

researchProduct

Performance improvements of real-time crowd simulations

2010

The current challenge for crowd simulations is the design and development of a scalable system that is capable of simulating the individual behavior of millions of complex agents populating large scale virtual worlds with a good frame rate. In order to overcome this challenge, this thesis proposes different improvements for crowd simulations. Concretely, we propose a distributed software architecture that can take advantage of the existing distributed and multi-core architectures. In turn, the use of these distributed architectures requires partitioning strategies and workload balancing techniques for distributed crowd simulations. Also, these architectures allow the use of GPUs not only fo…

Computer graphicsMulti-core processorComputer scienceVirtual machineSoftware agentServerDistributed computingReal-time computingcomputer.software_genreSoftware architectureMetaversecomputerRendering (computer graphics)2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)

researchProduct

Real-time computation of parameter fitting and image reconstruction using graphical processing units

2016

Abstract In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μ SR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission T…

FOS: Computer and information sciencesMulti-core processorSpeedup010308 nuclear & particles physicsComputer scienceComputationFOS: Physical sciencesGeneral Physics and AstronomyIterative reconstructionComputational Physics (physics.comp-ph)Supercomputer01 natural sciences030218 nuclear medicine & medical imagingComputational science03 medical and health sciencesRange (mathematics)CUDA0302 clinical medicineComputer Science - Distributed Parallel and Cluster ComputingHardware and Architecture0103 physical sciencesSingle-coreDistributed Parallel and Cluster Computing (cs.DC)Physics - Computational PhysicsComputer Physics Communications

researchProduct

Improving Collective I/O Performance Using Non-volatile Memory Devices

2016

Collective I/O is a parallel I/O technique designed to deliver high performance data access to scientific applications running on high-end computing clusters. In collective I/O, write performance is highly dependent upon the storage system response time and limited by the slowest writer. The storage system response time in conjunction with the need for global synchronisation, required during every round of data exchange and write, severely impacts collective I/O performance. Future Exascale systems will have an increasing number of processor cores, while the number of storage servers will remain relatively small. Therefore, the storage system concurrency level will further increase, worseni…

Input/outputFile system020203 distributed computingMulti-core processorbusiness.industryComputer scienceConcurrency020206 networking & telecommunications02 engineering and technologycomputer.software_genreSupercomputerNon-volatile memoryMemory managementData accessServerComputer data storage0202 electrical engineering electronic engineering information engineeringbusinesscomputerComputer network2016 IEEE International Conference on Cluster Computing (CLUSTER)

researchProduct